AutoML using H2O library

H2O AutoML automates the process of training and tuning a large selection of machine learning models. It lets users find a strong model for their data without manually selecting algorithms or tuning hyperparameters. Here’s a step-by-step guide to using H2O AutoML.

1. Import the necessary libraries and initialize the H2O cluster:

import h2o
from h2o.automl import H2OAutoML

# Optional imports for standalone estimators (not used by AutoML itself)
from h2o.estimators.random_forest import H2ORandomForestEstimator  # Random Forest
from h2o.estimators.glm import H2OGeneralizedLinearEstimator  # GLM (e.g., logistic regression)

# Initialize the H2O cluster
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java HotSpot(TM) 64-Bit Server VM (build 23+37-2369, mixed mode, sharing)
  Starting server from C:\Users\Fimran\anaconda3\Lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\Fimran\AppData\Local\Temp\tmpbm5bu07r
  JVM stdout: C:\Users\Fimran\AppData\Local\Temp\tmpbm5bu07r\h2o_Fimran_started_from_python.out
  JVM stderr: C:\Users\Fimran\AppData\Local\Temp\tmpbm5bu07r\h2o_Fimran_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime: 02 secs
H2O_cluster_timezone: Asia/Kolkata
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.46.0.5
H2O_cluster_version_age: 1 month and 11 days
H2O_cluster_name: H2O_from_python_Fimran_1l8p57
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 3.907 Gb
H2O_cluster_total_cores: 12
H2O_cluster_allowed_cores: 12
H2O_cluster_status: locked, healthy
H2O_connection_url: http://127.0.0.1:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
Python_version: 3.11.5 final
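By default, `h2o.init()` starts (or connects to) a local cluster with automatic settings. If needed, memory and CPU usage can be pinned explicitly; the sketch below shows commonly used options, with illustrative values rather than requirements:

```python
import h2o

# Optional: constrain the local cluster (example values; adjust to your machine)
h2o.init(
    max_mem_size="4G",   # cap the JVM heap size
    nthreads=-1,         # -1 = use all available cores
    port=54321,          # default REST API port
)
```

If a cluster is already running at the given port, `h2o.init()` connects to it instead of starting a new one.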

2. Load the dataset into an H2OFrame:

# Import a sample dataset
data = h2o.import_file("cust_master.csv")

# Display dataset summary
data.describe()
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Rows:6990
Cols:21
C1 custid debtinc creddebt othdebt preloan veh house selfemp account deposit emp address branch ref age gender ms child zone bad
type int int real real real int int int int int int int int int int int int int int int int
mins 1.0 1.0 0.69 0.05 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
mean 3495.5 3495.5 12.756690987124461 3.0376380543633767 4.056942775393419 1.4967095851216023 1.5048640915593705 1.492274678111588 1.4957081545064377 1.4948497854077254 1.509585121602289 8.892131616595139 8.773533619456368 1.4861230329041488 1.5010014306151644 1.978826895565093 1.4989985693848356 1.4974248927038627 1.249928469241774 10.395422031473533 0.1323319027181688
maxs 6990.0 6990.0 45.33 22.49 28.74 2.0 2.0 2.0 2.0 2.0 2.0 32.0 35.0 2.0 2.0 3.0 2.0 2.0 2.0 20.0 1.0
sigma 2017.9835232231208 2017.9835232231208 6.986628247480894 2.2828533059578655 3.33913367973861 0.5000249414952199 0.5000121075779711 0.49997608078686767 0.5000173476222526 0.500009241905527 0.49994387964527565 6.674586326055574 6.832234286280297 0.49984314812487557 0.5000347662810315 0.8122388311372561 0.5000347662810315 0.5000291375211107 0.43300236980701956 5.780590936825641 0.3388754917947687
zeros 0 0 0 0 4 0 0 0 0 0 0 303 239 0 0 0 0 0 0 0 6065
missing 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1.0 1.0 12.34 13.26 5.88 2.0 2.0 1.0 2.0 1.0 2.0 18.0 12.0 1.0 2.0 2.0 1.0 2.0 2.0 7.0 1.0
1 2.0 2.0 18.65 2.12 5.13 1.0 2.0 1.0 1.0 2.0 1.0 11.0 7.0 1.0 2.0 2.0 2.0 1.0 1.0 5.0 0.0
2 3.0 3.0 7.22 3.31 3.65 1.0 1.0 2.0 1.0 1.0 1.0 16.0 15.0 1.0 1.0 1.0 1.0 2.0 1.0 15.0 0.0
3 4.0 4.0 6.15 2.95 2.34 1.0 1.0 2.0 1.0 1.0 2.0 15.0 14.0 2.0 1.0 2.0 1.0 1.0 1.0 3.0 0.0
4 5.0 5.0 20.64 2.67 4.07 2.0 2.0 1.0 1.0 1.0 1.0 2.0 1.0 2.0 2.0 2.0 2.0 1.0 1.0 20.0 0.0
5 6.0 6.0 12.44 3.06 2.57 2.0 2.0 2.0 1.0 1.0 2.0 5.0 5.0 1.0 2.0 1.0 2.0 1.0 1.0 5.0 0.0
6 7.0 7.0 34.21 6.51 17.62 2.0 2.0 1.0 2.0 2.0 2.0 21.0 10.0 2.0 2.0 2.0 1.0 2.0 1.0 3.0 0.0
7 8.0 8.0 8.27 2.38 3.09 1.0 2.0 1.0 1.0 1.0 2.0 12.0 12.0 2.0 1.0 1.0 2.0 1.0 1.0 1.0 0.0
8 9.0 9.0 26.67 3.41 4.22 2.0 2.0 2.0 2.0 2.0 2.0 3.0 4.0 2.0 2.0 2.0 1.0 2.0 1.0 8.0 0.0
9 10.0 10.0 19.81 2.81 4.06 2.0 2.0 1.0 1.0 1.0 1.0 1.0 13.0 1.0 2.0 2.0 1.0 2.0 1.0 17.0 0.0
[6990 rows x 21 columns]

Converting Data Types

cols_to_factor = ["bad", "preloan", "veh", "house", "selfemp", "account", "deposit", 
                  "branch", "ref", "age", "gender", "ms", "child", "zone"]

# Convert the specified columns to factors using asfactor()
for col in cols_to_factor:
    if col in data.columns:
        data[col] = data[col].asfactor()
data.describe()
Rows:6990
Cols:21
C1 custid debtinc creddebt othdebt preloan veh house selfemp account deposit emp address branch ref age gender ms child zone bad
type int int real real real enum enum enum enum enum enum int int enum enum enum enum enum enum enum enum
mins 1.0 1.0 0.69 0.05 0.0 0.0 0.0
mean 3495.5 3495.5 12.756690987124461 3.0376380543633767 4.056942775393419 8.892131616595139 8.773533619456368
maxs 6990.0 6990.0 45.33 22.49 28.74 32.0 35.0
sigma 2017.9835232231208 2017.9835232231208 6.986628247480894 2.2828533059578655 3.33913367973861 6.674586326055574 6.832234286280297
zeros 0 0 0 0 4 303 239
missing 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1.0 1.0 12.34 13.26 5.88 2 2 1 2 1 2 18.0 12.0 1 2 2 1 2 2 7 1
1 2.0 2.0 18.65 2.12 5.13 1 2 1 1 2 1 11.0 7.0 1 2 2 2 1 1 5 0
2 3.0 3.0 7.22 3.31 3.65 1 1 2 1 1 1 16.0 15.0 1 1 1 1 2 1 15 0
3 4.0 4.0 6.15 2.95 2.34 1 1 2 1 1 2 15.0 14.0 2 1 2 1 1 1 3 0
4 5.0 5.0 20.64 2.67 4.07 2 2 1 1 1 1 2.0 1.0 2 2 2 2 1 1 20 0
5 6.0 6.0 12.44 3.06 2.57 2 2 2 1 1 2 5.0 5.0 1 2 1 2 1 1 5 0
6 7.0 7.0 34.21 6.51 17.62 2 2 1 2 2 2 21.0 10.0 2 2 2 1 2 1 3 0
7 8.0 8.0 8.27 2.38 3.09 1 2 1 1 1 2 12.0 12.0 2 1 1 2 1 1 1 0
8 9.0 9.0 26.67 3.41 4.22 2 2 2 2 2 2 3.0 4.0 2 2 2 1 2 1 8 0
9 10.0 10.0 19.81 2.81 4.06 2 2 1 1 1 1 1.0 13.0 1 2 2 1 2 1 17 0
[6990 rows x 21 columns]

3. Split the dataset into training and testing sets:

# Split the dataset into training and testing sets (~80/20; split_frame ratios are approximate)
train, test = data.split_frame(ratios=[.8], seed=42)

# Define target and features
target = "bad"
features = data.columns
features.remove(target)
features.remove("C1")  # Remove row-index column (an ID, not a predictor)
features.remove("custid")  # Remove customer ID column (an ID, not a predictor)

List of Features/Independent Variables

features #Independent variables.
['debtinc',
 'creddebt',
 'othdebt',
 'preloan',
 'veh',
 'house',
 'selfemp',
 'account',
 'deposit',
 'emp',
 'address',
 'branch',
 'ref',
 'age',
 'gender',
 'ms',
 'child',
 'zone']
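As a quick sanity check, the feature list should contain 18 entries: the 21 original columns minus the target and the two ID columns. A minimal sketch of that check, with the column names copied from the dataset summary above:

```python
# All 21 columns as shown by data.describe()
columns = ["C1", "custid", "debtinc", "creddebt", "othdebt", "preloan", "veh",
           "house", "selfemp", "account", "deposit", "emp", "address", "branch",
           "ref", "age", "gender", "ms", "child", "zone", "bad"]

target = "bad"
# Drop the target and the two ID columns to get the predictors
features = [c for c in columns if c not in (target, "C1", "custid")]

assert len(features) == 18  # 21 columns - target - 2 ID columns
```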

4. Train a model using H2O’s AutoML:

H2O AutoML performs a comprehensive set of preprocessing and modeling steps automatically by default. Here’s an overview of the key operations it performs:

1. Preprocessing: - Missing Value Handling: Tree-based algorithms (GBM, Random Forest, XRT) handle missing values natively; GLM and Deep Learning impute them (e.g., mean imputation for numeric columns).

  • Categorical Encoding: Categorical (enum) columns are handled natively by each algorithm: tree-based models split on factor levels directly, while GLM expands them internally (one-hot style). Target encoding can optionally be enabled via the AutoML preprocessing argument.

  • Scaling and Normalization: Not required from the user; algorithms that benefit from it (such as GLM and Deep Learning) standardize the data internally.

  • Outlier Detection: Outliers are not explicitly removed or handled, but the tree-based algorithms used are robust to them.

  • Text Data Handling: Text features are not engineered automatically; text columns should be transformed beforehand (e.g., with H2O's Word2Vec estimator).

2. Model Training: By default, H2O AutoML trains and cross-validates several algorithms, including: - Gradient Boosting Machines (GBM)

  • XGBoost (where available; the log below shows it being skipped on this system)

  • Random Forest (RF)

  • Generalized Linear Models (GLM)

  • Extremely Randomized Trees (XRT)

  • Deep Learning models (Neural Networks)

  • Stacked Ensembles: two ensembles are built automatically:

  1. Best of Family: combines the top-performing model from each algorithm family.
  2. All Models: combines all trained models, which often yields the best leaderboard performance.

3. Hyperparameter Tuning: H2O AutoML tunes hyperparameters internally using random grid search over pre-defined search spaces, so users do not need to configure grids themselves. This tuning improves the performance of the individual models.

4. Cross-Validation: Automatically performs k-fold cross-validation (5-fold by default, configurable via nfolds) so that reported model performance is reliable.

5. Model Selection: H2O AutoML ranks the models based on performance metrics (e.g., AUC for classification or RMSE for regression) and selects the best models to form an ensemble.

6. Leaderboard Creation: Automatically generates a leaderboard that ranks models based on the chosen evaluation metric. You can choose to focus on accuracy, AUC, log loss, or other metrics depending on the problem type.

7. Early Stopping: Implements early stopping for models to prevent overfitting and save computational time.

8. Model Explainability: H2O AutoML provides model explainability tools such as:

  • SHAP (Shapley) Values: To understand feature importance.

  • Partial Dependence Plots (PDPs): For visualizing the relationship between features and predictions.

  • Permutation-based Variable Importance: To rank features by their impact on model performance.

# Initialize H2OAutoML
aml = H2OAutoML(max_models=20, seed=42, max_runtime_secs=600)

# Train the model
aml.train(x=features, y=target, training_frame=train)
AutoML progress: |
11:24:24.444: AutoML: XGBoost is not available; skipping it.

███████████████████████████████████████████████████████████████| (done) 100%
Model Details
=============
H2OGradientBoostingEstimator : Gradient Boosting Machine
Model Key: GBM_5_AutoML_1_20241009_112424
Model Summary:
number_of_trees number_of_internal_trees model_size_in_bytes min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
41.0 41.0 32538.0 6.0 6.0 6.0 46.0 64.0 58.536587
ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.05798879407987211
RMSE: 0.24080862542664894
LogLoss: 0.20296083091017397
Mean Per-Class Error: 0.11968677750961834
AUC: 0.9691700270222204
AUCPR: 0.8590785913049259
Gini: 0.9383400540444409
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2811249925833942
0 1 Error Rate
0 4692.0 169.0 0.0348 (169.0/4861.0)
1 151.0 587.0 0.2046 (151.0/738.0)
Total 4843.0 756.0 0.0572 (320.0/5599.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
max f1 0.2811250 0.7858099 172.0
max f2 0.2221599 0.8153266 205.0
max f0point5 0.3503162 0.8129667 141.0
max accuracy 0.3119006 0.9451688 157.0
max precision 0.8546170 1.0 0.0
max recall 0.0545966 1.0 335.0
max specificity 0.8546170 1.0 0.0
max absolute_mcc 0.2811250 0.7529121 172.0
max min_per_class_accuracy 0.2030941 0.8991977 218.0
max mean_per_class_accuracy 0.1741540 0.9018485 235.0
max tns 0.8546170 4861.0 0.0
max fns 0.8546170 736.0 0.0
max fps 0.0071014 4861.0 399.0
max tps 0.0545966 738.0 335.0
max tnr 0.8546170 1.0 0.0
max fnr 0.8546170 0.9972900 0.0
max fpr 0.0071014 1.0 399.0
max tpr 0.0545966 1.0 335.0
Gains/Lift Table: Avg response rate: 13.18 %, avg score: 13.20 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain kolmogorov_smirnov
1 0.0100018 0.7174637 7.5867209 7.5867209 1.0 0.7754809 1.0 0.7754809 0.0758808 0.0758808 658.6720867 658.6720867 0.0758808
2 0.0200036 0.6458214 7.4512437 7.5189823 0.9821429 0.6804975 0.9910714 0.7279892 0.0745257 0.1504065 645.1243709 651.8982288 0.1502008
3 0.0300054 0.5898271 7.1802894 7.4060847 0.9464286 0.6223356 0.9761905 0.6927713 0.0718157 0.2222222 618.0289392 640.6084656 0.2213993
4 0.0400071 0.5501982 7.3157666 7.3835051 0.9642857 0.5707757 0.9732143 0.6622724 0.0731707 0.2953930 631.5766551 638.3505130 0.2941586
5 0.0500089 0.5054288 6.9093351 7.2886711 0.9107143 0.5277966 0.9607143 0.6353773 0.0691057 0.3644986 590.9335075 628.8671119 0.3622357
6 0.1000179 0.3457525 5.7984224 6.5435467 0.7642857 0.4181940 0.8625 0.5267856 0.2899729 0.6544715 479.8422377 554.3546748 0.6386312
7 0.1500268 0.2615462 3.3869290 5.4913408 0.4464286 0.2995108 0.7238095 0.4510273 0.1693767 0.8238482 238.6928959 449.1340818 0.7761214
8 0.2000357 0.2068014 1.4089624 4.4707462 0.1857143 0.2321835 0.5892857 0.3963164 0.0704607 0.8943089 40.8962447 347.0746225 0.7996782
9 0.3000536 0.1389303 0.7857675 3.2424200 0.1035714 0.1699834 0.4273810 0.3208721 0.0785908 0.9728997 -21.4232482 224.2419990 0.7749981
10 0.4000714 0.0966537 0.1490249 2.4690712 0.0196429 0.1161553 0.3254464 0.2696929 0.0149051 0.9878049 -85.0975126 146.9071211 0.6769635
11 0.5000893 0.0686251 0.0948340 1.9942238 0.0125 0.0814035 0.2628571 0.2320350 0.0094851 0.9972900 -90.5165989 99.4223771 0.5726860
12 0.5999286 0.0486535 0.0271439 1.6668651 0.0035778 0.0581704 0.2197082 0.2031007 0.0027100 1.0 -97.2856097 66.6865138 0.4608105
13 0.6999464 0.0342953 0.0 1.4286808 0.0 0.0407263 0.1883133 0.1798984 0.0 1.0 -100.0 42.8680786 0.3456079
14 0.7999643 0.0236381 0.0 1.2500558 0.0 0.0286057 0.1647689 0.1609826 0.0 1.0 -100.0 25.0055816 0.2304053
15 0.8999821 0.0157506 0.0 1.1111332 0.0 0.0193955 0.1464576 0.1452476 0.0 1.0 -100.0 11.1133161 0.1152026
16 1.0 0.0060067 0.0 1.0 0.0 0.0124270 0.1318093 0.1319632 0.0 1.0 -100.0 0.0 0.0
ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.09835941427789067
RMSE: 0.31362304487695203
LogLoss: 0.3218450588933127
Mean Per-Class Error: 0.28599887161183896
AUC: 0.7991839813481452
AUCPR: 0.3688428713689448
Gini: 0.5983679626962903
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.1569836755858861
0 1 Error Rate
0 3826.0 1035.0 0.2129 (1035.0/4861.0)
1 265.0 473.0 0.3591 (265.0/738.0)
Total 4091.0 1508.0 0.2322 (1300.0/5599.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
max f1 0.1569837 0.4211932 234.0
max f2 0.0916476 0.5613871 290.0
max f0point5 0.3036292 0.3990915 144.0
max accuracy 0.6064466 0.8733702 35.0
max precision 0.8628242 1.0 0.0
max recall 0.0058842 1.0 399.0
max specificity 0.8628242 1.0 0.0
max absolute_mcc 0.1424608 0.3301183 245.0
max min_per_class_accuracy 0.1250472 0.7249322 260.0
max mean_per_class_accuracy 0.1288552 0.7266559 257.0
max tns 0.8628242 4861.0 0.0
max fns 0.8628242 736.0 0.0
max fps 0.0058842 4861.0 399.0
max tps 0.0058842 738.0 399.0
max tnr 0.8628242 1.0 0.0
max fnr 0.8628242 0.9972900 0.0
max fpr 0.0058842 1.0 399.0
max tpr 0.0058842 1.0 399.0
Gains/Lift Table: Avg response rate: 13.18 %, avg score: 12.28 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain kolmogorov_smirnov
1 0.0100018 0.6149146 5.2836092 5.2836092 0.6964286 0.6863950 0.6964286 0.6863950 0.0528455 0.0528455 428.3609175 428.3609175 0.0493483
2 0.0200036 0.5322130 3.9288376 4.6062234 0.5178571 0.5748791 0.6071429 0.6306370 0.0392954 0.0921409 292.8837592 360.6223384 0.0830893
3 0.0300054 0.4813727 2.3031117 3.8385195 0.3035714 0.5052498 0.5059524 0.5888413 0.0230352 0.1151762 130.3111692 283.8519486 0.0981015
4 0.0400071 0.4462272 3.1159746 3.6578833 0.4107143 0.4646869 0.4821429 0.5578027 0.0311653 0.1463415 211.5974642 265.7883275 0.1224781
5 0.0500089 0.4137505 3.3869290 3.6036924 0.4464286 0.4290785 0.475 0.5320578 0.0338753 0.1802168 238.6928959 260.3692412 0.1499761
6 0.1000179 0.3108010 2.7908295 3.1972609 0.3678571 0.3588246 0.4214286 0.4454412 0.1395664 0.3197832 179.0829462 219.7260937 0.2531302
7 0.1500268 0.2508502 1.9779665 2.7908295 0.2607143 0.2773215 0.3678571 0.3894013 0.0989160 0.4186992 97.7966512 179.0829462 0.3094624
8 0.2000357 0.2055743 2.0863482 2.6147092 0.275 0.2260367 0.3446429 0.3485601 0.1043360 0.5230352 108.6348238 161.4709156 0.3720375
9 0.3000536 0.1401051 1.6392736 2.2895640 0.2160714 0.1698332 0.3017857 0.2889845 0.1639566 0.6869919 63.9273616 128.9563976 0.4456835
10 0.4000714 0.0984932 0.9212447 1.9474842 0.1214286 0.1179712 0.2566964 0.2462312 0.0921409 0.7791328 -7.8755323 94.7484151 0.4366107
11 0.5000893 0.0704701 0.9212447 1.7422363 0.1214286 0.0840820 0.2296429 0.2138013 0.0921409 0.8712737 -7.8755323 74.2236256 0.4275379
12 0.5999286 0.0489041 0.6378817 1.5584511 0.0840787 0.0588823 0.2054183 0.1880199 0.0636856 0.9349593 -36.2118281 55.8451146 0.3858954
13 0.6999464 0.0339962 0.3657883 1.3880273 0.0482143 0.0411236 0.1829548 0.1670294 0.0365854 0.9715447 -63.4211672 38.8027268 0.3128325
14 0.7999643 0.0228108 0.1896680 1.2381989 0.025 0.0281985 0.1632061 0.1496717 0.0189702 0.9905149 -81.0331978 23.8198918 0.2194801
15 0.8999821 0.0151734 0.0812863 1.1096276 0.0107143 0.0187586 0.1462592 0.1351229 0.0081301 0.9986450 -91.8713705 10.9627561 0.1136419
16 1.0 0.0048037 0.0135477 1.0 0.0017857 0.0115250 0.1318093 0.1227609 0.0013550 1.0 -98.6452284 0.0 0.0
Cross-Validation Metrics Summary:
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
accuracy 0.7928123 0.0463139 0.7553572 0.7830357 0.8607143 0.8160715 0.7488830
aic nan 0.0 nan nan nan nan nan
auc 0.7984324 0.0087963 0.7880711 0.8083007 0.7956054 0.8068472 0.7933376
err 0.2071877 0.0463139 0.2446428 0.2169643 0.1392857 0.1839286 0.2511171
err_count 232.0 51.81216 274.0 243.0 156.0 206.0 281.0
f0point5 0.378579 0.0390662 0.3739246 0.3645433 0.4466859 0.3575077 0.3502335
f1 0.4331958 0.0181995 0.452 0.4387991 0.4428571 0.4046243 0.4276986
f2 0.515326 0.0587197 0.5712841 0.5510441 0.4390935 0.4660453 0.5491632
lift_top_group 4.822247 0.4419564 4.580777 5.221445 5.258216 4.839506 4.2112904
loglikelihood nan 0.0 nan nan nan nan nan
--- --- --- --- --- --- --- ---
mcc 0.3425887 0.0208215 0.3530558 0.3540724 0.3633323 0.3125524 0.3299305
mean_per_class_accuracy 0.7096006 0.0244627 0.7295934 0.7323725 0.6794550 0.6876857 0.7188964
mean_per_class_error 0.2903994 0.0244627 0.2704066 0.2676274 0.3205450 0.3123143 0.2811036
mse 0.0986880 0.0069054 0.1086909 0.0947773 0.0947164 0.0922721 0.1029835
pr_auc 0.3680402 0.0139760 0.3623444 0.3834353 0.3715635 0.3470756 0.3757821
precision 0.3512853 0.0554632 0.3353116 0.3275862 0.4492754 0.3317536 0.3125
r2 0.1372044 0.0097169 0.1259628 0.1490389 0.1444725 0.1295643 0.1369836
recall 0.5980290 0.1141727 0.6932516 0.6643357 0.4366197 0.5185185 0.6774194
rmse 0.3139953 0.0108989 0.3296831 0.3078592 0.3077603 0.3037633 0.3209104
specificity 0.8211722 0.0683314 0.7659352 0.8004094 0.9222904 0.8568528 0.7603735
[22 rows x 8 columns]
Scoring History:
timestamp duration number_of_trees training_rmse training_logloss training_auc training_pr_auc training_lift training_classification_error
2024-10-09 11:25:56 5.315 sec 0.0 0.3382833 0.3898116 0.5 0.1318093 1.0 0.8681907
2024-10-09 11:25:57 5.382 sec 5.0 0.3141975 0.3311806 0.8795053 0.5590947 6.9212190 0.1373460
2024-10-09 11:25:57 5.457 sec 10.0 0.2957483 0.2933720 0.9101026 0.6448950 6.9093351 0.1110913
2024-10-09 11:25:57 5.529 sec 15.0 0.2816278 0.2679429 0.9295040 0.7045951 7.3157666 0.1130559
2024-10-09 11:25:57 5.598 sec 20.0 0.2711035 0.2498535 0.9392502 0.7437255 7.4512437 0.0918021
2024-10-09 11:25:57 5.673 sec 25.0 0.2614576 0.2343159 0.9499067 0.7858177 7.3157666 0.0764422
2024-10-09 11:25:57 5.744 sec 30.0 0.2526424 0.2203327 0.9588501 0.8197262 7.5867209 0.0691195
2024-10-09 11:25:57 5.814 sec 35.0 0.2483659 0.2137913 0.9613795 0.8311050 7.5867209 0.0669762
2024-10-09 11:25:57 5.884 sec 40.0 0.2415671 0.2041347 0.9684352 0.8564471 7.5867209 0.0576889
2024-10-09 11:25:57 5.916 sec 41.0 0.2408086 0.2029608 0.9691700 0.8590786 7.5867209 0.0571531
Variable Importances:
variable relative_importance scaled_importance percentage
zone 287.8631287 1.0 0.2101459
emp 251.5005798 0.8736811 0.1836005
debtinc 195.4653320 0.6790218 0.1426936
creddebt 183.7053986 0.6381693 0.1341086
address 166.7575836 0.5792947 0.1217364
othdebt 135.8368988 0.4718802 0.0991637
branch 24.9527931 0.0866828 0.0182160
age 19.7141228 0.0684844 0.0143917
account 16.2183323 0.0563404 0.0118397
ref 13.8606892 0.0481503 0.0101186
preloan 13.6628952 0.0474632 0.0099742
deposit 10.9796371 0.0381419 0.0080154
selfemp 10.0097427 0.0347726 0.0073073
house 9.6610336 0.0335612 0.0070527
veh 9.5842371 0.0332944 0.0069967
child 7.3031216 0.0253701 0.0053314
ms 6.9090428 0.0240011 0.0050437
gender 5.8408189 0.0202903 0.0042639
[tips]
Use `model.explain()` to inspect the model.
--
Use `h2o.display.toggle_user_tips()` to switch on/off this section.
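The printed tables can be sanity-checked by hand. The sketch below recomputes the cross-validation confusion-matrix error rates and the variable-importance normalization, using numbers copied from the output above:

```python
# Cross-validation confusion matrix (Act/Pred) copied from the output above
tn, fp = 3826, 1035   # actual class 0
fn, tp = 265, 473     # actual class 1

err_0 = fp / (tn + fp)                       # class-0 error rate
err_1 = fn / (fn + tp)                       # class-1 error rate
err_total = (fp + fn) / (tn + fp + fn + tp)  # overall error rate

assert abs(err_0 - 0.2129) < 1e-4
assert abs(err_1 - 0.3591) < 1e-4
assert abs(err_total - 0.2322) < 1e-4

# Variable importances: scaled_importance = relative / max,
# percentage = relative / sum (values copied from the table above)
relative = [287.8631287, 251.5005798, 195.4653320, 183.7053986, 166.7575836,
            135.8368988, 24.9527931, 19.7141228, 16.2183323, 13.8606892,
            13.6628952, 10.9796371, 10.0097427, 9.6610336, 9.5842371,
            7.3031216, 6.9090428, 5.8408189]

scaled_zone = relative[0] / max(relative)  # zone is the top variable
pct_zone = relative[0] / sum(relative)

assert scaled_zone == 1.0
assert abs(pct_zone - 0.2101459) < 1e-6
```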

5. View leaderboard and best model:

The leaderboard is a ranked table of all the models trained during an AutoML run, sorted by performance on a specified evaluation metric (e.g., AUC for classification or RMSE for regression). It helps users easily compare and identify the best models generated by H2O AutoML.

# View the AutoML leaderboard
lb = aml.leaderboard
lb
model_id auc logloss aucpr mean_per_class_error rmse mse
GBM_5_AutoML_1_20241009_112424 0.799184 0.321845 0.368843 0.285999 0.313623 0.0983594
GBM_3_AutoML_1_20241009_112424 0.797276 0.325308 0.359837 0.288801 0.315105 0.0992914
GBM_2_AutoML_1_20241009_112424 0.793164 0.326505 0.351882 0.291428 0.315963 0.0998326
GBM_4_AutoML_1_20241009_112424 0.792472 0.330147 0.34293 0.316141 0.317891 0.101054
GBM_1_AutoML_1_20241009_112424 0.790935 0.3247 0.354853 0.307383 0.314626 0.0989897
GBM_grid_1_AutoML_1_20241009_112424_model_5 0.781697 0.336529 0.334394 0.313841 0.319423 0.102031
GBM_grid_1_AutoML_1_20241009_112424_model_2 0.779318 0.337171 0.342177 0.297754 0.319859 0.102309
GBM_grid_1_AutoML_1_20241009_112424_model_4 0.778936 0.33781 0.334394 0.299292 0.320145 0.102493
XRT_1_AutoML_1_20241009_112424 0.778485 0.335497 0.323896 0.312216 0.319008 0.101766
GBM_grid_1_AutoML_1_20241009_112424_model_1 0.777735 0.333056 0.337218 0.297228 0.31808 0.101175
[15 rows x 7 columns]
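The default leaderboard shows a fixed set of metrics. Extra columns, such as training time, can be requested with `get_leaderboard`; this sketch assumes the completed `aml` run from above:

```python
from h2o.automl import get_leaderboard

# Request all optional columns (e.g., training_time_ms, predict_time_per_row_ms)
lb_full = get_leaderboard(aml, extra_columns="ALL")

# Display every row instead of the default first 10
lb_full.head(rows=lb_full.nrows)
```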

Get the best model

best_model = aml.leader
best_model
Model Details
=============
H2OGradientBoostingEstimator : Gradient Boosting Machine
Model Key: GBM_5_AutoML_1_20241009_112424
Model Summary:
number_of_trees number_of_internal_trees model_size_in_bytes min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
41.0 41.0 32538.0 6.0 6.0 6.0 46.0 64.0 58.536587
ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.05798879407987211
RMSE: 0.24080862542664894
LogLoss: 0.20296083091017397
Mean Per-Class Error: 0.11968677750961834
AUC: 0.9691700270222204
AUCPR: 0.8590785913049259
Gini: 0.9383400540444409
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2811249925833942
0 1 Error Rate
0 4692.0 169.0 0.0348 (169.0/4861.0)
1 151.0 587.0 0.2046 (151.0/738.0)
Total 4843.0 756.0 0.0572 (320.0/5599.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
max f1 0.2811250 0.7858099 172.0
max f2 0.2221599 0.8153266 205.0
max f0point5 0.3503162 0.8129667 141.0
max accuracy 0.3119006 0.9451688 157.0
max precision 0.8546170 1.0 0.0
max recall 0.0545966 1.0 335.0
max specificity 0.8546170 1.0 0.0
max absolute_mcc 0.2811250 0.7529121 172.0
max min_per_class_accuracy 0.2030941 0.8991977 218.0
max mean_per_class_accuracy 0.1741540 0.9018485 235.0
max tns 0.8546170 4861.0 0.0
max fns 0.8546170 736.0 0.0
max fps 0.0071014 4861.0 399.0
max tps 0.0545966 738.0 335.0
max tnr 0.8546170 1.0 0.0
max fnr 0.8546170 0.9972900 0.0
max fpr 0.0071014 1.0 399.0
max tpr 0.0545966 1.0 335.0
Gains/Lift Table: Avg response rate: 13.18 %, avg score: 13.20 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain kolmogorov_smirnov
1 0.0100018 0.7174637 7.5867209 7.5867209 1.0 0.7754809 1.0 0.7754809 0.0758808 0.0758808 658.6720867 658.6720867 0.0758808
2 0.0200036 0.6458214 7.4512437 7.5189823 0.9821429 0.6804975 0.9910714 0.7279892 0.0745257 0.1504065 645.1243709 651.8982288 0.1502008
3 0.0300054 0.5898271 7.1802894 7.4060847 0.9464286 0.6223356 0.9761905 0.6927713 0.0718157 0.2222222 618.0289392 640.6084656 0.2213993
4 0.0400071 0.5501982 7.3157666 7.3835051 0.9642857 0.5707757 0.9732143 0.6622724 0.0731707 0.2953930 631.5766551 638.3505130 0.2941586
5 0.0500089 0.5054288 6.9093351 7.2886711 0.9107143 0.5277966 0.9607143 0.6353773 0.0691057 0.3644986 590.9335075 628.8671119 0.3622357
6 0.1000179 0.3457525 5.7984224 6.5435467 0.7642857 0.4181940 0.8625 0.5267856 0.2899729 0.6544715 479.8422377 554.3546748 0.6386312
7 0.1500268 0.2615462 3.3869290 5.4913408 0.4464286 0.2995108 0.7238095 0.4510273 0.1693767 0.8238482 238.6928959 449.1340818 0.7761214
8 0.2000357 0.2068014 1.4089624 4.4707462 0.1857143 0.2321835 0.5892857 0.3963164 0.0704607 0.8943089 40.8962447 347.0746225 0.7996782
9 0.3000536 0.1389303 0.7857675 3.2424200 0.1035714 0.1699834 0.4273810 0.3208721 0.0785908 0.9728997 -21.4232482 224.2419990 0.7749981
10 0.4000714 0.0966537 0.1490249 2.4690712 0.0196429 0.1161553 0.3254464 0.2696929 0.0149051 0.9878049 -85.0975126 146.9071211 0.6769635
11 0.5000893 0.0686251 0.0948340 1.9942238 0.0125 0.0814035 0.2628571 0.2320350 0.0094851 0.9972900 -90.5165989 99.4223771 0.5726860
12 0.5999286 0.0486535 0.0271439 1.6668651 0.0035778 0.0581704 0.2197082 0.2031007 0.0027100 1.0 -97.2856097 66.6865138 0.4608105
13 0.6999464 0.0342953 0.0 1.4286808 0.0 0.0407263 0.1883133 0.1798984 0.0 1.0 -100.0 42.8680786 0.3456079
14 0.7999643 0.0236381 0.0 1.2500558 0.0 0.0286057 0.1647689 0.1609826 0.0 1.0 -100.0 25.0055816 0.2304053
15 0.8999821 0.0157506 0.0 1.1111332 0.0 0.0193955 0.1464576 0.1452476 0.0 1.0 -100.0 11.1133161 0.1152026
16 1.0 0.0060067 0.0 1.0 0.0 0.0124270 0.1318093 0.1319632 0.0 1.0 -100.0 0.0 0.0
ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.09835941427789067
RMSE: 0.31362304487695203
LogLoss: 0.3218450588933127
Mean Per-Class Error: 0.28599887161183896
AUC: 0.7991839813481452
AUCPR: 0.3688428713689448
Gini: 0.5983679626962903
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.1569836755858861
0 1 Error Rate
0 3826.0 1035.0 0.2129 (1035.0/4861.0)
1 265.0 473.0 0.3591 (265.0/738.0)
Total 4091.0 1508.0 0.2322 (1300.0/5599.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
max f1 0.1569837 0.4211932 234.0
max f2 0.0916476 0.5613871 290.0
max f0point5 0.3036292 0.3990915 144.0
max accuracy 0.6064466 0.8733702 35.0
max precision 0.8628242 1.0 0.0
max recall 0.0058842 1.0 399.0
max specificity 0.8628242 1.0 0.0
max absolute_mcc 0.1424608 0.3301183 245.0
max min_per_class_accuracy 0.1250472 0.7249322 260.0
max mean_per_class_accuracy 0.1288552 0.7266559 257.0
max tns 0.8628242 4861.0 0.0
max fns 0.8628242 736.0 0.0
max fps 0.0058842 4861.0 399.0
max tps 0.0058842 738.0 399.0
max tnr 0.8628242 1.0 0.0
max fnr 0.8628242 0.9972900 0.0
max fpr 0.0058842 1.0 399.0
max tpr 0.0058842 1.0 399.0
Gains/Lift Table: Avg response rate: 13.18 %, avg score: 12.28 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain kolmogorov_smirnov
1 0.0100018 0.6149146 5.2836092 5.2836092 0.6964286 0.6863950 0.6964286 0.6863950 0.0528455 0.0528455 428.3609175 428.3609175 0.0493483
2 0.0200036 0.5322130 3.9288376 4.6062234 0.5178571 0.5748791 0.6071429 0.6306370 0.0392954 0.0921409 292.8837592 360.6223384 0.0830893
3 0.0300054 0.4813727 2.3031117 3.8385195 0.3035714 0.5052498 0.5059524 0.5888413 0.0230352 0.1151762 130.3111692 283.8519486 0.0981015
4 0.0400071 0.4462272 3.1159746 3.6578833 0.4107143 0.4646869 0.4821429 0.5578027 0.0311653 0.1463415 211.5974642 265.7883275 0.1224781
5 0.0500089 0.4137505 3.3869290 3.6036924 0.4464286 0.4290785 0.475 0.5320578 0.0338753 0.1802168 238.6928959 260.3692412 0.1499761
6 0.1000179 0.3108010 2.7908295 3.1972609 0.3678571 0.3588246 0.4214286 0.4454412 0.1395664 0.3197832 179.0829462 219.7260937 0.2531302
7 0.1500268 0.2508502 1.9779665 2.7908295 0.2607143 0.2773215 0.3678571 0.3894013 0.0989160 0.4186992 97.7966512 179.0829462 0.3094624
8 0.2000357 0.2055743 2.0863482 2.6147092 0.275 0.2260367 0.3446429 0.3485601 0.1043360 0.5230352 108.6348238 161.4709156 0.3720375
9 0.3000536 0.1401051 1.6392736 2.2895640 0.2160714 0.1698332 0.3017857 0.2889845 0.1639566 0.6869919 63.9273616 128.9563976 0.4456835
10 0.4000714 0.0984932 0.9212447 1.9474842 0.1214286 0.1179712 0.2566964 0.2462312 0.0921409 0.7791328 -7.8755323 94.7484151 0.4366107
11 0.5000893 0.0704701 0.9212447 1.7422363 0.1214286 0.0840820 0.2296429 0.2138013 0.0921409 0.8712737 -7.8755323 74.2236256 0.4275379
12 0.5999286 0.0489041 0.6378817 1.5584511 0.0840787 0.0588823 0.2054183 0.1880199 0.0636856 0.9349593 -36.2118281 55.8451146 0.3858954
13 0.6999464 0.0339962 0.3657883 1.3880273 0.0482143 0.0411236 0.1829548 0.1670294 0.0365854 0.9715447 -63.4211672 38.8027268 0.3128325
14 0.7999643 0.0228108 0.1896680 1.2381989 0.025 0.0281985 0.1632061 0.1496717 0.0189702 0.9905149 -81.0331978 23.8198918 0.2194801
15 0.8999821 0.0151734 0.0812863 1.1096276 0.0107143 0.0187586 0.1462592 0.1351229 0.0081301 0.9986450 -91.8713705 10.9627561 0.1136419
16 1.0 0.0048037 0.0135477 1.0 0.0017857 0.0115250 0.1318093 0.1227609 0.0013550 1.0 -98.6452284 0.0 0.0
Cross-Validation Metrics Summary:
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
accuracy 0.7928123 0.0463139 0.7553572 0.7830357 0.8607143 0.8160715 0.7488830
aic nan 0.0 nan nan nan nan nan
auc 0.7984324 0.0087963 0.7880711 0.8083007 0.7956054 0.8068472 0.7933376
err 0.2071877 0.0463139 0.2446428 0.2169643 0.1392857 0.1839286 0.2511171
err_count 232.0 51.81216 274.0 243.0 156.0 206.0 281.0
f0point5 0.378579 0.0390662 0.3739246 0.3645433 0.4466859 0.3575077 0.3502335
f1 0.4331958 0.0181995 0.452 0.4387991 0.4428571 0.4046243 0.4276986
f2 0.515326 0.0587197 0.5712841 0.5510441 0.4390935 0.4660453 0.5491632
lift_top_group 4.822247 0.4419564 4.580777 5.221445 5.258216 4.839506 4.2112904
loglikelihood nan 0.0 nan nan nan nan nan
--- --- --- --- --- --- --- ---
mcc 0.3425887 0.0208215 0.3530558 0.3540724 0.3633323 0.3125524 0.3299305
mean_per_class_accuracy 0.7096006 0.0244627 0.7295934 0.7323725 0.6794550 0.6876857 0.7188964
mean_per_class_error 0.2903994 0.0244627 0.2704066 0.2676274 0.3205450 0.3123143 0.2811036
mse 0.0986880 0.0069054 0.1086909 0.0947773 0.0947164 0.0922721 0.1029835
pr_auc 0.3680402 0.0139760 0.3623444 0.3834353 0.3715635 0.3470756 0.3757821
precision 0.3512853 0.0554632 0.3353116 0.3275862 0.4492754 0.3317536 0.3125
r2 0.1372044 0.0097169 0.1259628 0.1490389 0.1444725 0.1295643 0.1369836
recall 0.5980290 0.1141727 0.6932516 0.6643357 0.4366197 0.5185185 0.6774194
rmse 0.3139953 0.0108989 0.3296831 0.3078592 0.3077603 0.3037633 0.3209104
specificity 0.8211722 0.0683314 0.7659352 0.8004094 0.9222904 0.8568528 0.7603735
[22 rows x 8 columns]
Scoring History:
timestamp            duration   number_of_trees  training_rmse  training_logloss  training_auc  training_pr_auc  training_lift  training_classification_error
-------------------  ---------  ---------------  -------------  ----------------  ------------  ---------------  -------------  -----------------------------
2024-10-09 11:25:56  5.315 sec  0.0              0.3382833      0.3898116         0.5           0.1318093        1.0            0.8681907
2024-10-09 11:25:57  5.382 sec  5.0              0.3141975      0.3311806         0.8795053     0.5590947        6.9212190      0.1373460
2024-10-09 11:25:57  5.457 sec  10.0             0.2957483      0.2933720         0.9101026     0.6448950        6.9093351      0.1110913
2024-10-09 11:25:57  5.529 sec  15.0             0.2816278      0.2679429         0.9295040     0.7045951        7.3157666      0.1130559
2024-10-09 11:25:57  5.598 sec  20.0             0.2711035      0.2498535         0.9392502     0.7437255        7.4512437      0.0918021
2024-10-09 11:25:57  5.673 sec  25.0             0.2614576      0.2343159         0.9499067     0.7858177        7.3157666      0.0764422
2024-10-09 11:25:57  5.744 sec  30.0             0.2526424      0.2203327         0.9588501     0.8197262        7.5867209      0.0691195
2024-10-09 11:25:57  5.814 sec  35.0             0.2483659      0.2137913         0.9613795     0.8311050        7.5867209      0.0669762
2024-10-09 11:25:57  5.884 sec  40.0             0.2415671      0.2041347         0.9684352     0.8564471        7.5867209      0.0576889
2024-10-09 11:25:57  5.916 sec  41.0             0.2408086      0.2029608         0.9691700     0.8590786        7.5867209      0.0571531
Variable Importances:
variable  relative_importance  scaled_importance  percentage
--------  -------------------  -----------------  ----------
zone      287.8631287          1.0                0.2101459
emp       251.5005798          0.8736811          0.1836005
debtinc   195.4653320          0.6790218          0.1426936
creddebt  183.7053986          0.6381693          0.1341086
address   166.7575836          0.5792947          0.1217364
othdebt   135.8368988          0.4718802          0.0991637
branch    24.9527931           0.0866828          0.0182160
age       19.7141228           0.0684844          0.0143917
account   16.2183323           0.0563404          0.0118397
ref       13.8606892           0.0481503          0.0101186
preloan   13.6628952           0.0474632          0.0099742
deposit   10.9796371           0.0381419          0.0080154
selfemp   10.0097427           0.0347726          0.0073073
house     9.6610336            0.0335612          0.0070527
veh       9.5842371            0.0332944          0.0069967
child     7.3031216            0.0253701          0.0053314
ms        6.9090428            0.0240011          0.0050437
gender    5.8408189            0.0202903          0.0042639
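The scaled_importance and percentage columns are simple rescalings of relative_importance: divide by the largest value and by the sum of all values, respectively. A quick pure-Python check using the numbers printed above (the total is recovered from zone's percentage, since all 18 variables are needed for the exact sum):

```python
# Recompute H2O's derived importance columns from the raw relative
# importances printed above (top three rows of the table).
relative = {
    "zone": 287.8631287,
    "emp": 251.5005798,
    "debtinc": 195.4653320,
}

# percentage = relative / total, so the total over all 18 variables
# can be recovered from zone's row: total = 287.8631287 / 0.2101459
total = 287.8631287 / 0.2101459

scaled = {k: v / max(relative.values()) for k, v in relative.items()}
percentage = {k: v / total for k, v in relative.items()}

print(scaled["emp"])        # matches the reported 0.8736811 (up to rounding)
print(percentage["debtinc"])  # matches the reported 0.1426936
```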

6. Make predictions:

# Make predictions on the test set
predictions = best_model.predict(test)

# Display predictions
print(predictions)
gbm prediction progress: |███████████████████████████████████████████████████████| (done) 100%
  predict        p0         p1
        0  0.973755  0.0262446
        0  0.987736  0.0122643
        1  0.690831  0.309169
        0  0.889461  0.110539
        1  0.655082  0.344918
        1  0.683939  0.316061
        0  0.887284  0.112716
        0  0.987983  0.012017
        0  0.860587  0.139413
        0  0.89405   0.10595
[1391 rows x 3 columns]
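Note that p0 and p1 in each row sum to 1, and the predict label is not a plain 0.5 cut on p1: H2O labels a row 1 when p1 exceeds the model's F1-optimal threshold, which is well below 0.5 for this imbalanced target. A small check using the ten rows printed above:

```python
# The ten (p0, p1, predict) rows printed above.
rows = [
    (0.973755, 0.0262446, 0),
    (0.987736, 0.0122643, 0),
    (0.690831, 0.309169, 1),
    (0.889461, 0.110539, 0),
    (0.655082, 0.344918, 1),
    (0.683939, 0.316061, 1),
    (0.887284, 0.112716, 0),
    (0.987983, 0.012017, 0),
    (0.860587, 0.139413, 0),
    (0.89405, 0.10595, 0),
]

# The two class probabilities always sum to 1.
assert all(abs(p0 + p1 - 1) < 1e-5 for p0, p1, _ in rows)

# The label is NOT a 0.5 cut: three rows are labeled 1 even though p1 < 0.5,
# because H2O compares p1 against the model's F1-optimal threshold instead.
assert any(label == 1 and p1 < 0.5 for _, p1, label in rows)
```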

7. Interpretation of Model Predictions

The explain() function in H2O AutoML provides a comprehensive set of visual and textual outputs that help you understand and interpret the models trained during the AutoML run. It generates a series of explainability plots and metrics, enabling you to gain insights into how the models make predictions, what features are most important, and how sensitive the models are to different input variables.

Here’s a breakdown of what the explain() function typically gives you:

1. Model-Specific Explainability:

  • Leader Model Explainability: explanations generated for the single best (leader) model from the run.

  • Global Explainability: explanations computed across all the models in the AutoML run, useful for comparing them.

2. Key Explainability Outputs:

  • Learning Curves:

Learning curves for various models are plotted to show how the performance changes as more data or time is used during training. This can indicate whether a model is overfitting or underfitting.

  • Variable Importance Plot:

This plot shows the most important features (variables) used by the best-performing models. It ranks features by their contribution to the model’s predictive power.

  • SHAP Summary Plot:

SHAP (Shapley Additive Explanations) values are used to explain the contribution of each feature to the model’s predictions. The summary plot provides an overview of how each feature impacts the model, showing both the direction (positive/negative influence) and the magnitude of the impact. It shows the distribution of SHAP values for each feature across all observations.

  • Partial Dependence Plots (PDPs):

These plots show how the predicted outcome changes as a function of one or two features. It helps you understand the relationship between the model’s predictions and specific features. PDPs can be used to interpret how changes in a feature value affect the predictions.

NOTE: The explain() function typically includes both global and local explanations, so you may see similar outputs twice—once for the whole dataset (global) and once for a few individual instances (local).

best_model.explain(test)

Confusion Matrix

Confusion matrix shows a predicted class vs an actual class.

GBM_5_AutoML_1_20241009_112424

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.13482166696691691
       0    1    Error    Rate
-----  ---  ---  -------  --------------
0      895  309  0.2566   (309.0/1204.0)
1      63   124  0.3369   (63.0/187.0)
Total  958  433  0.2674   (372.0/1391.0)

Learning Curve Plot

Learning curve plot shows the loss function/metric dependent on number of iterations or trees for tree-based algorithms. This plot can be useful for determining whether the model overfits.


Variable Importance

The variable importance plot shows the relative importance of the most important variables in the model.


SHAP Summary

SHAP summary plot shows the contribution of the features for each instance (row of data). The sum of the feature contributions and the bias term is equal to the raw prediction of the model, i.e., prediction before applying inverse link function.


Partial Dependence Plots

Partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. PDP assumes independence between the feature for which the PDP is computed and the rest.






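To make the SHAP and PDP ideas concrete without H2O, here is a toy, pure-Python sketch built on a hypothetical two-feature model: it computes exact Shapley values, checking the additivity property stated under "SHAP Summary" (feature contributions plus the bias term recover the raw prediction), and a partial dependence value obtained by averaging predictions with one feature held fixed.

```python
# Toy model (hypothetical): raw score with an interaction term.
def f(x1, x2):
    return 0.8 * x1 + 0.3 * x2 + 0.5 * x1 * x2

data = [(1.0, 2.0), (0.0, 1.0), (2.0, 0.0), (1.0, 1.0)]

# Baseline (bias term): the average prediction over the dataset.
bias = sum(f(a, b) for a, b in data) / len(data)

def shapley(x1, x2):
    """Exact Shapley values for two features; 'absent' features are
    averaged over the dataset (the usual background-expectation trick)."""
    n = len(data)
    v_none = bias                                  # no features known
    v_1 = sum(f(x1, b) for _, b in data) / n       # only x1 known
    v_2 = sum(f(a, x2) for a, _ in data) / n       # only x2 known
    v_12 = f(x1, x2)                               # both known
    phi1 = 0.5 * ((v_1 - v_none) + (v_12 - v_2))
    phi2 = 0.5 * ((v_2 - v_none) + (v_12 - v_1))
    return phi1, phi2

phi1, phi2 = shapley(1.0, 2.0)
# Additivity: contributions plus the bias recover the raw prediction.
assert abs(phi1 + phi2 + bias - f(1.0, 2.0)) < 1e-9

# Partial dependence of x1 at a grid value: average the prediction over
# the data with x1 forced to that value.
def pdp_x1(value):
    return sum(f(value, b) for _, b in data) / len(data)

print(pdp_x1(0.0), pdp_x1(2.0))  # mean response as x1 moves from 0 to 2
```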

8. Model Performance Evaluation

The model_performance() function in H2O AutoML evaluates the performance of a trained model on a specified dataset (e.g., training, validation, or test data). It computes and returns various evaluation metrics that help you understand how well the model is performing in terms of accuracy, precision, recall, AUC, RMSE, and more, depending on the type of problem (classification or regression).

1. Metrics for Classification Models: For binary and multiclass classification models, model_performance() returns metrics like:

  • Accuracy: The ratio of correctly predicted observations to total observations.

  • AUC (Area Under the ROC Curve): Measures the ability of the model to distinguish between classes.

  • Confusion Matrix: Shows the true positive, false positive, true negative, and false negative counts.

  • Log Loss: Penalizes incorrect classifications based on their probability estimates.

  • Precision, Recall, F1-Score: Metrics for classification tasks that focus on positive class prediction accuracy and balance between precision and recall.

  • Gini Coefficient: A normalized version of the AUC.

performance = best_model.model_performance(test_data=test)
performance
ModelMetricsBinomial: gbm
** Reported on test data. **

MSE: 0.10473846220898163
RMSE: 0.3236332217325373
LogLoss: 0.3412375513280259
Mean Per-Class Error: 0.3179353136603479
AUC: 0.7820722369285981
AUCPR: 0.3407591787498713
Gini: 0.5641444738571961
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.17000951139521145
       0     1    Error    Rate
-----  ----  ---  -------  --------------
0      1005  199  0.1653   (199.0/1204.0)
1      88    99   0.4706   (88.0/187.0)
Total  1093  298  0.2063   (287.0/1391.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value      idx
---------------------------  -----------  ---------  -----
max f1                       0.1700095    0.4082474  184
max f2                       0.0764910    0.5654762  280
max f0point5                 0.3077163    0.3639010  92
max accuracy                 0.6709660    0.8698778  5
max precision                0.8425266    1.0        0
max recall                   0.0153502    1.0        378
max specificity              0.8425266    1.0        0
max absolute_mcc             0.0764910    0.3061064  280
max min_per_class_accuracy   0.1013523    0.7051495  252
max mean_per_class_accuracy  0.0764910    0.7220317  280
max tns                      0.8425266    1204.0     0
max fns                      0.8425266    186.0      0
max fps                      0.0059506    1204.0     399
max tps                      0.0153502    187.0      378
max tnr                      0.8425266    1.0        0
max fnr                      0.8425266    0.9946524  0
max fpr                      0.0059506    1.0        399
max tpr                      0.0153502    1.0        378
Gains/Lift Table: Avg response rate: 13.44 %, avg score: 11.13 %
group cumulative_data_fraction lower_threshold lift cumulative_lift response_rate score cumulative_response_rate cumulative_score capture_rate cumulative_capture_rate gain cumulative_gain kolmogorov_smirnov
1 0.0100647 0.6115409 4.2505730 4.2505730 0.5714286 0.6871765 0.5714286 0.6871765 0.0427807 0.0427807 325.0572956 325.0572956 0.0377974
2 0.0201294 0.5258748 3.1879297 3.7192513 0.4285714 0.5730079 0.5 0.6300922 0.0320856 0.0748663 218.7929717 271.9251337 0.0632384
3 0.0301941 0.4738463 1.5939649 3.0108225 0.2142857 0.5038153 0.4047619 0.5879999 0.0160428 0.0909091 59.3964859 201.0822511 0.0701450
4 0.0402588 0.4353423 2.1252865 2.7894385 0.2857143 0.4563083 0.375 0.5550770 0.0213904 0.1122995 112.5286478 178.9438503 0.0832297
5 0.0503235 0.3911629 3.1879297 2.8691367 0.4285714 0.4153212 0.3857143 0.5271259 0.0320856 0.1443850 218.7929717 186.9136746 0.1086707
6 0.1006470 0.2863727 2.5503438 2.7097403 0.3428571 0.3343506 0.3642857 0.4307382 0.1283422 0.2727273 155.0343774 170.9740260 0.1988070
7 0.1502516 0.2227587 2.2638921 2.5625464 0.3043478 0.2517451 0.3444976 0.3716448 0.1122995 0.3850267 126.3892118 156.2546376 0.2712394
8 0.2005751 0.1815985 2.0190222 2.4261783 0.2714286 0.2012020 0.3261649 0.3288814 0.1016043 0.4866310 101.9022154 142.6178291 0.3304848
9 0.3005032 0.1215747 1.3913746 2.0820689 0.1870504 0.1477210 0.2799043 0.2686391 0.1390374 0.6256684 39.1374601 108.2068930 0.3756684
10 0.4004313 0.0835414 1.4984034 1.9364145 0.2014388 0.1014471 0.2603232 0.2269161 0.1497326 0.7754011 49.8403416 93.6414520 0.4332084
11 0.5003595 0.0592398 0.8027161 1.7100006 0.1079137 0.0706902 0.2298851 0.1957158 0.0802139 0.8556150 -19.7283884 71.0000615 0.4104322
12 0.6002876 0.0421191 0.8562305 1.5678760 0.1151079 0.0497100 0.2107784 0.1714107 0.0855615 0.9411765 -14.3769476 56.7876013 0.3938343
13 0.7002157 0.0286848 0.4281153 1.4052202 0.0575540 0.0348764 0.1889117 0.1519258 0.0427807 0.9839572 -57.1884738 40.5220218 0.3278110
14 0.8001438 0.0189914 0.1070288 1.2430921 0.0143885 0.0237889 0.1671159 0.1359231 0.0106952 0.9946524 -89.2971185 24.3092091 0.2247189
15 0.9000719 0.0128346 0.0535144 1.1110224 0.0071942 0.0152851 0.1493610 0.1225296 0.0053476 1.0 -94.6485592 11.1022364 0.1154485
16 1.0 0.0059438 0.0 1.0 0.0 0.0101358 0.1344357 0.1112983 0.0 1.0 -100.0 0.0 0.0
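Several of the reported numbers are tied together by simple identities, which makes for a handy sanity check: Gini = 2·AUC − 1; the mean per-class error is the average of the two class error rates in the confusion matrix; the reported max f1 follows directly from the confusion-matrix counts at that threshold; and each lift value is the group's response rate divided by the 13.44 % average response rate.

```python
# Sanity checks tying together the metrics printed above.
auc = 0.7820722369285981
gini = 0.5641444738571961
assert abs(gini - (2 * auc - 1)) < 1e-12  # Gini is rescaled AUC

# Confusion-matrix counts at the max-F1 threshold (Act/Pred):
tn, fp = 1005, 199   # actual 0: correct / wrong
fn, tp = 88, 99      # actual 1: wrong / correct
mean_per_class_error = (fp / (tn + fp) + fn / (fn + tp)) / 2
assert abs(mean_per_class_error - 0.3179353136603479) < 1e-6

# The reported "max f1" follows from the same counts.
f1 = 2 * tp / (2 * tp + fp + fn)
assert abs(f1 - 0.4082474) < 1e-6

# Lift for gains/lift group 1: response rate / average response rate.
avg_response_rate = 187 / 1391            # the 13.44 % average
lift_group1 = 0.5714286 / avg_response_rate
assert abs(lift_group1 - 4.2505730) < 1e-4
```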

Building a Random Forest Model

How it works?
  • Single Model: When you use H2ORandomForestEstimator, it trains a single RandomForest model with the parameters you specify (like ntrees, max_depth, etc.).

  • Ensemble of Trees: Internally, this RandomForest model builds multiple decision trees (an ensemble of trees), but this is still considered one model. The trees in the ensemble work together to make predictions, but it’s part of the same model instance.

  • No Model Search: It does not automatically try different hyperparameter settings or model types.

Hyperparameters for H2ORandomForestEstimator (RandomForest)
  • ntrees: (default = 50) The number of trees to build in the random forest model. More trees increase the model’s accuracy but also increase computation time.

  • max_depth: (default = 20) The maximum depth of a tree. A higher value allows the tree to grow deeper, which can capture more details of the dataset but might lead to overfitting.

  • min_rows: (default = 1) The minimum number of rows in a node before it becomes a leaf. Larger values lead to smaller trees and prevent overfitting.

  • sample_rate: (default = 0.632) The fraction of the training data used to build each tree. Rows are sampled without replacement; the 0.632 default matches the expected fraction of distinct rows in a bootstrap sample.

  • mtries: (default = -1) The number of features randomly selected at each split. If set to -1, it defaults to the square root of the total number of features for classification (and one third of the features for regression).

  • col_sample_rate_per_tree: (default = 1.0) Proportion of columns (features) randomly selected per tree.

  • stopping_rounds: (default = 0) Early stopping based on a chosen metric. If the validation performance doesn’t improve after a certain number of rounds, training is stopped.

  • seed: Seed for random number generation, used for reproducibility.
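Two of these defaults have simple numeric rationales. With the 18 predictors listed in the variable-importance table earlier, mtries = -1 resolves to floor(sqrt(18)) = 4 candidate features per split for this classification problem (assuming the usual square-root rule), and the 0.632 sample_rate default matches the expected fraction of distinct rows in a bootstrap sample, 1 − 1/e:

```python
import math

n_features = 18  # predictors in the variable-importance table above
mtries = int(math.sqrt(n_features))  # square-root rule for classification
print(mtries)  # 4

# Why 0.632 is the default sample_rate: sampling n rows out of n with
# replacement keeps, in expectation, a fraction 1 - 1/e of distinct rows.
print(round(1 - math.exp(-1), 3))  # 0.632
```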


# Initialize RandomForest model
rf_model = H2ORandomForestEstimator(ntrees=500, seed=42)

# Train the RandomForest model
rf_model.train(x=features, y=target, training_frame=train)

# Evaluate model performance on the test set
performance_rf = rf_model.model_performance(test_data=test)
print(performance_rf)
drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
ModelMetricsBinomial: drf
** Reported on test data. **

MSE: 0.10017492403407333
RMSE: 0.3165042243542309
LogLoss: 0.32405912531682185
Mean Per-Class Error: 0.26865661698083043
AUC: 0.8006600103043332
AUCPR: 0.3535170012235277
Gini: 0.6013200206086664

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.1671388888855775
       0    1    Error    Rate
-----  ---  ---  -------  --------------
0      879  325  0.2699   (325.0/1204.0)
1      50   137  0.2674   (50.0/187.0)
Total  929  462  0.2696   (375.0/1391.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.167139     0.422188  216
max f2                       0.149        0.576019  237
max f0point5                 0.273167     0.402969  113
max accuracy                 0.412        0.869159  29
max precision                0.481        0.6       8
max recall                   0.0408643    1         358
max specificity              0.6385       0.999169  0
max absolute_mcc             0.167139     0.335114  216
max min_per_class_accuracy   0.167139     0.730066  216
max mean_per_class_accuracy  0.152867     0.735707  231
max tns                      0.6385       1203      0
max fns                      0.6385       187       0
max fps                      0            1204      399
max tps                      0.0408643    187       358
max tnr                      0.6385       0.999169  0
max fnr                      0.6385       1         0
max fpr                      0            1         399
max tpr                      0.0408643    1         358

Gains/Lift Table: Avg response rate: 13.44 %, avg score: 13.45 %
group    cumulative_data_fraction    lower_threshold    lift      cumulative_lift    response_rate    score       cumulative_response_rate    cumulative_score    capture_rate    cumulative_capture_rate    gain      cumulative_gain    kolmogorov_smirnov
-------  --------------------------  -----------------  --------  -----------------  ---------------  ----------  --------------------------  ------------------  --------------  -------------------------  --------  -----------------  --------------------
1        0.0100647                   0.4536             3.71925   3.71925            0.5              0.512065    0.5                         0.512065            0.0374332       0.0374332                  271.925   271.925            0.0316192
2        0.0201294                   0.421933           3.71925   3.71925            0.5              0.433514    0.5                         0.47279             0.0374332       0.0748663                  271.925   271.925            0.0632384
3        0.0301941                   0.40198            4.78189   4.07347            0.642857         0.412706    0.547619                    0.452762            0.0481283       0.122995                   378.189   307.347            0.107214
4        0.0402588                   0.373267           2.12529   3.58642            0.285714         0.38636     0.482143                    0.436161            0.0213904       0.144385                   112.529   258.642            0.120299
5        0.0503235                   0.357833           2.65661   3.40046            0.357143         0.365857    0.457143                    0.4221              0.026738        0.171123                   165.661   240.046            0.139562
6        0.100647                    0.302              2.44408   2.92227            0.328571         0.327309    0.392857                    0.374705            0.122995        0.294118                   144.408   192.227            0.22352
7        0.150252                    0.262167           2.69511   2.84727            0.362319         0.281809    0.382775                    0.344036            0.13369         0.427807                   169.511   184.727            0.320665
8        0.200575                    0.232              1.59396   2.53282            0.214286         0.245822    0.340502                    0.319394            0.0802139       0.508021                   59.3965   153.282            0.355197
9        0.300503                    0.178              1.55192   2.20664            0.208633         0.203461    0.296651                    0.280842            0.15508         0.663102                   55.1918   120.664            0.418916
10       0.400431                    0.141              1.28435   1.97648            0.172662         0.159182    0.265709                    0.250482            0.128342        0.791444                   28.4346   97.6478            0.451743
11       0.500359                    0.105762           0.642173  1.71               0.0863309        0.121879    0.229885                    0.224798            0.0641711       0.855615                   -35.7827  71.0001            0.410432
12       0.603882                    0.072              0.8265    1.55854            0.111111         0.0868037   0.209524                    0.201142            0.0855615       0.941176                   -17.35    55.8543            0.389681
13       0.706686                    0.05               0.41614   1.39235            0.0559441        0.0591081   0.187182                    0.18048             0.0427807       0.983957                   -58.386   39.2355            0.320336
14       0.802301                    0.031              0.167786  1.24642            0.0225564        0.0398253   0.167563                    0.163717            0.0160428       1                          -83.2214  24.6416            0.228405
15       0.900791                    0.016              0         1.11014            0                0.0226126   0.149242                    0.148289            0               1                          -100      11.0136            0.114618
16       1                           0                  0         1                  0                0.00884212  0.134436                    0.134455            0               1                          -100      0                  0

# Initialize RandomForest model
rf_model = H2ORandomForestEstimator(ntrees=100, seed=42,max_depth = 10)

# Train the RandomForest model
rf_model.train(x=features, y=target, training_frame=train)

# Evaluate model performance on the test set
performance_rf = rf_model.model_performance(test_data=test)
print(performance_rf)
drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
ModelMetricsBinomial: drf
** Reported on test data. **

MSE: 0.10091330261450727
RMSE: 0.3176685420599705
LogLoss: 0.3264377626473603
Mean Per-Class Error: 0.314977259402704
AUC: 0.7963739406967861
AUCPR: 0.3436652379070182
Gini: 0.5927478813935723

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.21877322517335418
       0     1    Error    Rate
-----  ----  ---  -------  --------------
0      1025  179  0.1487   (179.0/1204.0)
1      90    97   0.4813   (90.0/187.0)
Total  1115  276  0.1934   (269.0/1391.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.218773     0.419006  135
max f2                       0.102913     0.573601  261
max f0point5                 0.308555     0.420032  68
max accuracy                 0.425323     0.866283  13
max precision                0.425323     0.529412  13
max recall                   0.0288167    1         355
max specificity              0.5658       0.999169  0
max absolute_mcc             0.163727     0.327726  192
max min_per_class_accuracy   0.163727     0.721925  192
max mean_per_class_accuracy  0.163727     0.725996  192
max tns                      0.5658       1203      0
max fns                      0.5658       186       0
max fps                      0.00110944   1204      399
max tps                      0.0288167    187       355
max tnr                      0.5658       0.999169  0
max fnr                      0.5658       0.994652  0
max fpr                      0.00110944   1         399
max tpr                      0.0288167    1         355

Gains/Lift Table: Avg response rate: 13.44 %, avg score: 12.95 %
group    cumulative_data_fraction    lower_threshold    lift      cumulative_lift    response_rate    score       cumulative_response_rate    cumulative_score    capture_rate    cumulative_capture_rate    gain      cumulative_gain    kolmogorov_smirnov
-------  --------------------------  -----------------  --------  -----------------  ---------------  ----------  --------------------------  ------------------  --------------  -------------------------  --------  -----------------  --------------------
1        0.0100647                   0.429442           3.18793   3.18793            0.428571         0.476572    0.428571                    0.476572            0.0320856       0.0320856                  218.793   218.793            0.025441
2        0.0201294                   0.384879           3.18793   3.18793            0.428571         0.407174    0.428571                    0.441873            0.0320856       0.0641711                  218.793   218.793            0.0508821
3        0.0301941                   0.37253            3.71925   3.36504            0.5              0.377485    0.452381                    0.42041             0.0374332       0.101604                   271.925   236.504            0.0825013
4        0.0402588                   0.358368           3.71925   3.45359            0.5              0.3652      0.464286                    0.406608            0.0374332       0.139037                   271.925   245.359            0.11412
5        0.0503235                   0.345409           2.65661   3.29419            0.357143         0.351867    0.442857                    0.395659            0.026738        0.165775                   165.661   229.419            0.133383
6        0.100647                    0.286168           2.76287   3.02853            0.371429         0.312724    0.407143                    0.354192            0.139037        0.304813                   176.287   202.853            0.235876
7        0.150252                    0.245422           1.94048   2.66932            0.26087          0.262668    0.358852                    0.323976            0.0962567       0.40107                    94.0479   166.932            0.289774
8        0.200575                    0.216816           2.33782   2.58615            0.314286         0.230699    0.34767                     0.300573            0.117647        0.518717                   133.782   158.615            0.367554
9        0.300503                    0.172712           1.44489   2.20664            0.194245         0.193629    0.296651                    0.26501             0.144385        0.663102                   44.4889   120.664            0.418916
10       0.400431                    0.138752           1.1238    1.93641            0.151079         0.156411    0.260323                    0.237909            0.112299        0.775401                   12.3803   93.6415            0.433208
11       0.500359                    0.103187           1.07029   1.76344            0.143885         0.120909    0.237069                    0.214543            0.106952        0.882353                   7.02882   76.3438            0.441323
12       0.600288                    0.0751437          0.588658  1.56788            0.0791367        0.0887597   0.210778                    0.193604            0.0588235       0.941176                   -41.1342  56.7876            0.393834
13       0.700216                    0.046618           0.374601  1.39758            0.0503597        0.0600448   0.187885                    0.174544            0.0374332       0.97861                    -62.5399  39.7583            0.321633
14       0.800144                    0.0324072          0.107029  1.23641            0.0143885        0.0391986   0.166217                    0.157641            0.0106952       0.989305                   -89.2971  23.6409            0.218541
15       0.900072                    0.0168867          0.107029  1.11102            0.0143885        0.0242023   0.149361                    0.142826            0.0106952       1                          -89.2971  11.1022            0.115449
16       1                           0.0010809          0         1                  0                0.00977069  0.134436                    0.12953             0               1                          -100      0                  0

# Initialize RandomForest model
rf_model = H2ORandomForestEstimator(ntrees=200, max_depth=30, min_rows=5, sample_rate=0.7, mtries=-1, seed=42)

# Train the RandomForest model
rf_model.train(x=features, y=target, training_frame=train)

# Evaluate model performance on the test set
performance_rf = rf_model.model_performance(test_data=test)
print(performance_rf)
drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
ModelMetricsBinomial: drf
** Reported on test data. **

MSE: 0.10154071650982005
RMSE: 0.3186545410155331
LogLoss: 0.328994136228148
Mean Per-Class Error: 0.29792181143070334
AUC: 0.7927052427736422
AUCPR: 0.3371696215765514
Gini: 0.5854104855472844

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.21098592726886273
       0     1    Error    Rate
-----  ----  ---  -------  --------------
0      1021  183  0.152    (183.0/1204.0)
1      83    104  0.4439   (83.0/187.0)
Total  1104  287  0.1912   (266.0/1391.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.210986     0.438819  134
max f2                       0.102119     0.56683   256
max f0point5                 0.246928     0.403309  101
max accuracy                 0.395569     0.86844   18
max precision                0.395569     0.590909  18
max recall                   0.0384282    1         342
max specificity              0.533251     0.999169  0
max absolute_mcc             0.210986     0.340689  134
max min_per_class_accuracy   0.148348     0.709302  198
max mean_per_class_accuracy  0.102119     0.717624  256
max tns                      0.533251     1203      0
max fns                      0.533251     187       0
max fps                      0.00120436   1204      399
max tps                      0.0384282    187       342
max tnr                      0.533251     0.999169  0
max fnr                      0.533251     1         0
max fpr                      0.00120436   1         399
max tpr                      0.0384282    1         342

Gains/Lift Table: Avg response rate: 13.44 %, avg score: 12.48 %
group    cumulative_data_fraction    lower_threshold    lift      cumulative_lift    response_rate    score       cumulative_response_rate    cumulative_score    capture_rate    cumulative_capture_rate    gain      cumulative_gain    kolmogorov_smirnov
-------  --------------------------  -----------------  --------  -----------------  ---------------  ----------  --------------------------  ------------------  --------------  -------------------------  --------  -----------------  --------------------
1        0.0100647                   0.412927           3.71925   3.71925            0.5              0.460919    0.5                         0.460919            0.0374332       0.0374332                  271.925   271.925            0.0316192
2        0.0201294                   0.375432           3.71925   3.71925            0.5              0.392074    0.5                         0.426496            0.0374332       0.0748663                  271.925   271.925            0.0632384
3        0.0301941                   0.356945           2.65661   3.36504            0.357143         0.365737    0.452381                    0.406243            0.026738        0.101604                   165.661   236.504            0.0825013
4        0.0402588                   0.337652           2.65661   3.18793            0.357143         0.346649    0.428571                    0.391345            0.026738        0.128342                   165.661   218.793            0.101764
5        0.0503235                   0.324432           2.65661   3.08167            0.357143         0.332038    0.414286                    0.379483            0.026738        0.15508                    165.661   208.167            0.121027
6        0.100647                    0.280163           2.65661   2.86914            0.357143         0.300632    0.385714                    0.340058            0.13369         0.28877                    165.661   186.914            0.217341
7        0.150252                    0.240036           2.80291   2.84727            0.376812         0.258992    0.382775                    0.313294            0.139037        0.427807                   180.291   184.727            0.320665
8        0.200575                    0.212231           2.23155   2.69279            0.3              0.226482    0.362007                    0.291513            0.112299        0.540107                   123.155   169.279            0.392266
9        0.300503                    0.164038           1.07029   2.15325            0.143885         0.187874    0.289474                    0.25705             0.106952        0.647059                   7.02882   115.325            0.400381
10       0.400431                    0.128804           1.23083   1.92306            0.165468         0.14672     0.258528                    0.229517            0.122995        0.770053                   23.0831   92.306             0.42703
11       0.500359                    0.0983532          0.963259  1.73138            0.129496         0.112624    0.232759                    0.206172            0.0962567       0.86631                    -3.67407  73.1376            0.422789
12       0.600288                    0.0694863          0.535144  1.53224            0.0719424        0.0846138   0.205988                    0.185936            0.0534759       0.919786                   -46.4856  53.2242            0.369122
13       0.700216                    0.0487778          0.588658  1.39758            0.0791367        0.059074    0.187885                    0.167832            0.0588235       0.97861                    -41.1342  39.7583            0.321633
14       0.800144                    0.031881           0.214058  1.24978            0.028777         0.0392844   0.168014                    0.151778            0.0213904       1                          -78.5942  24.9775            0.230897
15       0.900072                    0.0175179          0         1.11102            0                0.0239208   0.149361                    0.137583            0               1                          -100      11.1022            0.115449
16       1                           0.00103472         0         1                  0                0.00998578  0.134436                    0.124832            0               1                          -100      0                  0

Performing Logistic Regression

How it works
  • Single Model: When you use H2OGeneralizedLinearEstimator for logistic regression, it trains a single logistic regression model with the hyperparameters you provide (e.g., alpha, lambda, max_iterations).

  • No Ensemble: Unlike ensemble models (e.g., Random Forest), logistic regression fits only one decision boundary. It is a single, linear model.

  • No Automatic Model Search: The logistic regression model does not automatically search for the best hyperparameters or try different configurations (unless you enable options such as lambda_search).

  • Feature Importance (Coefficients): You can inspect the coefficients of the logistic regression model, which represent the weight or contribution of each feature to the prediction.

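Because logistic regression models the log-odds of the positive class, each raw coefficient can be converted into an odds ratio with exp(). A minimal sketch of that interpretation (the coefficient values here are illustrative, not taken from a fitted model):

```python
import math

# Illustrative raw coefficients from a fitted binomial GLM (hypothetical values)
coefficients = {"Intercept": -1.60, "creddebt": 0.18, "emp": -0.14}

# exp(coefficient) is the multiplicative change in the odds of the positive
# class for a one-unit increase in that feature, holding the others fixed
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()
               if name != "Intercept"}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

An odds ratio above 1 means the feature increases the odds of the positive class; below 1 means it decreases them.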
Hyperparameters for H2OGeneralizedLinearEstimator (Logistic Regression)
  • family: (default = "gaussian") Defines the distribution family. For logistic regression, set it to "binomial" for binary classification.

  • alpha: (default = 0.5) Controls the elastic net mixing. Values range from 0 to 1, where 0 corresponds to pure L2 regularization (Ridge) and 1 corresponds to pure L1 regularization (Lasso). A value of 0.5 balances L1 and L2 regularization.

  • lambda: Regularization strength. Higher values apply stronger regularization, preventing overfitting; if set to 0, no regularization is applied. If not specified, H2O chooses a value automatically. Note that in the Python API this argument is spelled lambda_, because lambda is a reserved word in Python.

  • standardize: (default = True) Standardizes input features to have zero mean and unit variance. This is important for regularized models.

  • max_iterations: Maximum number of iterations allowed for convergence. This controls how long the algorithm will run before stopping.

  • early_stopping: (default = True) Stops training when performance does not improve for a certain number of iterations.

  • lambda_search: (default = False) Whether to perform an automatic search over the lambda parameter. This is useful for finding the best regularization strength.

  • solver: (default = "AUTO") Specifies the algorithm used to solve the optimization problem. Options include:

  1. "AUTO": Automatically selects the best solver.
  2. "IRLSM": Iteratively Reweighted Least Squares Method (well suited to datasets with few predictors).
  3. "L_BFGS": Limited-memory BFGS (better suited to datasets with many predictors).

  • link: Defines the link function for the model. For logistic regression (the "binomial" family), this is "logit".
# Initialize LogisticRegression model
lr_model = H2OGeneralizedLinearEstimator(family="binomial")

# Train the LogisticRegression model
lr_model.train(x=features, y=target, training_frame=train)

# Get raw coefficients (on the original feature scale)
coefficients = lr_model.coef()
print("Raw Coefficients: ", coefficients)
print("\n\n")

# Get normalized coefficients (scaled coefficients for better interpretability)
normalized_coefficients = lr_model.coef_norm()
print("Normalized Coefficients: ", normalized_coefficients)

# Evaluate model performance on the test set
performance_lr = lr_model.model_performance(test_data=test)
print(performance_lr)
glm Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
Raw Coefficients:  {'Intercept': -1.6004399344664146, 'zone.1': -0.6307199485786326, 'zone.2': 0.9113673280991164, 'zone.3': 0.13752924917282494, 'zone.4': -0.4421435946651578, 'zone.5': 0.039492850483345956, 'zone.6': 0.07864836426025151, 'zone.7': 0.253720948610828, 'zone.8': -0.24765784270631763, 'zone.9': 0.30898300485036806, 'zone.10': -0.26330786565227293, 'zone.11': -0.3063836189490286, 'zone.12': 0.2242329982874504, 'zone.13': 0.0, 'zone.14': -0.18835161437722484, 'zone.15': 0.0, 'zone.16': -0.3882332555656693, 'zone.17': 0.03708080979579925, 'zone.18': 0.21148067483216285, 'zone.19': -0.15823225266014376, 'zone.20': -0.15627523899345952, 'age.1': 0.0073250069303451015, 'age.2': -0.05921981781311567, 'age.3': 0.0, 'house.1': -0.039774746445066365, 'house.2': 0.038799143492173, 'selfemp.1': -0.03210697729551859, 'selfemp.2': 0.03171784160840208, 'account.1': -0.07282092069041296, 'account.2': 0.07112372721861773, 'deposit.1': 0.03151802436293467, 'deposit.2': -0.03222612751665652, 'branch.1': 0.1467692770402181, 'branch.2': -0.1570763348849329, 'ref.1': 0.042996137742830356, 'ref.2': -0.04410121794483475, 'preloan.1': -0.010537253386504977, 'preloan.2': 0.010458345601614523, 'gender.1': -0.02729945000338843, 'gender.2': 0.02693155887202589, 'ms.1': 0.0722694921600014, 'ms.2': -0.07372568989661751, 'child.1': -0.0243973908274804, 'child.2': 0.024085535453080446, 'veh.1': 0.009528010259046475, 'veh.2': -0.009547502258109256, 'debtinc': 0.03491879124853984, 'creddebt': 0.18125661393625606, 'othdebt': -0.0022096278911967523, 'emp': -0.14458642627177878, 'address': -0.039022708230817826}



Normalized Coefficients:  {'Intercept': -2.2362428289352847, 'zone.1': -0.6307199485786326, 'zone.2': 0.9113673280991164, 'zone.3': 0.13752924917282494, 'zone.4': -0.4421435946651578, 'zone.5': 0.039492850483345956, 'zone.6': 0.07864836426025151, 'zone.7': 0.253720948610828, 'zone.8': -0.24765784270631763, 'zone.9': 0.30898300485036806, 'zone.10': -0.26330786565227293, 'zone.11': -0.3063836189490286, 'zone.12': 0.2242329982874504, 'zone.13': 0.0, 'zone.14': -0.18835161437722484, 'zone.15': 0.0, 'zone.16': -0.3882332555656693, 'zone.17': 0.03708080979579925, 'zone.18': 0.21148067483216285, 'zone.19': -0.15823225266014376, 'zone.20': -0.15627523899345952, 'age.1': 0.0073250069303451015, 'age.2': -0.05921981781311567, 'age.3': 0.0, 'house.1': -0.039774746445066365, 'house.2': 0.038799143492173, 'selfemp.1': -0.03210697729551859, 'selfemp.2': 0.03171784160840208, 'account.1': -0.07282092069041296, 'account.2': 0.07112372721861773, 'deposit.1': 0.03151802436293467, 'deposit.2': -0.03222612751665652, 'branch.1': 0.1467692770402181, 'branch.2': -0.1570763348849329, 'ref.1': 0.042996137742830356, 'ref.2': -0.04410121794483475, 'preloan.1': -0.010537253386504977, 'preloan.2': 0.010458345601614523, 'gender.1': -0.02729945000338843, 'gender.2': 0.02693155887202589, 'ms.1': 0.0722694921600014, 'ms.2': -0.07372568989661751, 'child.1': -0.0243973908274804, 'child.2': 0.024085535453080446, 'veh.1': 0.009528010259046475, 'veh.2': -0.009547502258109256, 'debtinc': 0.24419003514631252, 'creddebt': 0.4147372334973271, 'othdebt': -0.007365263276548585, 'emp': -0.9616941618328921, 'address': -0.26734646976546844}
ModelMetricsBinomialGLM: glm
** Reported on test data. **

MSE: 0.10677077505846692
RMSE: 0.32675797627367403
LogLoss: 0.3460216698792851
AUC: 0.7560715618171158
AUCPR: 0.27763042768753804
Gini: 0.5121431236342315
Null degrees of freedom: 1390
Residual degrees of freedom: 1343
Null deviance: 1098.2293480025876
Residual deviance: 962.6322856041712
AIC: 1058.632285604171

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.13263511232759034
       0    1    Error    Rate
-----  ---  ---  -------  --------------
0      813  391  0.3248   (391.0/1204.0)
1      54   133  0.2888   (54.0/187.0)
Total  867  524  0.3199   (445.0/1391.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.132635     0.374121  228
max f2                       0.0781303    0.543131  295
max f0point5                 0.24369      0.33396   126
max accuracy                 0.660393     0.864845  0
max precision                0.370842     0.415385  52
max recall                   0.0239179    1         368
max specificity              0.660393     0.999169  0
max absolute_mcc             0.120793     0.273908  241
max min_per_class_accuracy   0.137574     0.686047  223
max mean_per_class_accuracy  0.120793     0.697603  241
max tns                      0.660393     1203      0
max fns                      0.660393     187       0
max fps                      0.0016324    1204      399
max tps                      0.0239179    187       368
max tnr                      0.660393     0.999169  0
max fnr                      0.660393     1         0
max fpr                      0.0016324    1         399
max tpr                      0.0239179    1         368

Gains/Lift Table: Avg response rate: 13.44 %, avg score: 13.14 %
group    cumulative_data_fraction    lower_threshold    lift      cumulative_lift    response_rate    score      cumulative_response_rate    cumulative_score    capture_rate    cumulative_capture_rate    gain      cumulative_gain    kolmogorov_smirnov
-------  --------------------------  -----------------  --------  -----------------  ---------------  ---------  --------------------------  ------------------  --------------  -------------------------  --------  -----------------  --------------------
1        0.0100647                   0.534239           1.59396   1.59396            0.214286         0.572557   0.214286                    0.572557            0.0160428       0.0160428                  59.3965   59.3965            0.00690657
2        0.0201294                   0.442563           3.18793   2.39095            0.428571         0.478809   0.321429                    0.525683            0.0320856       0.0481283                  218.793   139.095            0.0323476
3        0.0301941                   0.413535           3.71925   2.83372            0.5              0.428298   0.380952                    0.493221            0.0374332       0.0855615                  271.925   183.372            0.0639668
4        0.0402588                   0.385764           2.65661   2.78944            0.357143         0.402488   0.375                       0.470538            0.026738        0.112299                   165.661   178.944            0.0832297
5        0.0503235                   0.355914           3.71925   2.9754             0.5              0.371127   0.4                         0.450656            0.0374332       0.149733                   271.925   197.54             0.114849
6        0.100647                    0.291107           1.80649   2.39095            0.242857         0.32098    0.321429                    0.385818            0.0909091       0.240642                   80.6494   139.095            0.161738
7        0.150252                    0.24679            2.3717    2.38459            0.318841         0.269793   0.320574                    0.347513            0.117647        0.358289                   137.17    138.459            0.240349
8        0.200575                    0.216596           1.70023   2.21289            0.228571         0.231779   0.297491                    0.318476            0.0855615       0.44385                    70.0229   121.289            0.28106
9        0.300503                    0.163949           1.33786   1.92191            0.179856         0.187985   0.258373                    0.275083            0.13369         0.57754                    33.786    92.191             0.320065
10       0.400431                    0.124159           1.55192   1.82958            0.208633         0.143201   0.245961                    0.242172            0.15508         0.73262                    55.1918   82.9578            0.383783
11       0.500359                    0.0977669          0.856231  1.63519            0.115108         0.110003   0.219828                    0.215776            0.0855615       0.818182                   -14.3769  63.5188            0.367185
12       0.600288                    0.0755793          0.963259  1.52333            0.129496         0.0861405  0.20479                     0.194196            0.0962567       0.914439                   -3.67407  52.3334            0.362943
13       0.700216                    0.0555595          0.588658  1.38995            0.0791367        0.06552    0.186858                    0.175833            0.0588235       0.973262                   -41.1342  38.9946            0.315455
14       0.800144                    0.0350893          0.160543  1.23641            0.0215827        0.0454838  0.166217                    0.159554            0.0160428       0.989305                   -83.9457  23.6409            0.218541
15       0.900072                    0.0187203          0.107029  1.11102            0.0143885        0.0267177  0.149361                    0.144806            0.0106952       1                          -89.2971  11.1022            0.115449
16       1                           0.0016324          0         1                  0                0.0109839  0.134436                    0.131433            0               1                          -100      0                  0
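The confusion matrix reported at the max-F1 threshold (813 / 391 / 54 / 133) can be turned into precision, recall, and F1 by hand, and the F1 computed this way matches the max f1 value (0.374121) in the Maximum Metrics table. A quick sanity check in plain Python, using the counts from the output above:

```python
# Confusion matrix at the max-F1 threshold (from the model output above):
# actual 0: 813 predicted 0, 391 predicted 1
# actual 1:  54 predicted 0, 133 predicted 1
tn, fp, fn, tp = 813, 391, 54, 133

precision = tp / (tp + fp)          # 133 / 524
recall    = tp / (tp + fn)          # 133 / 187
f1        = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.4f}")  # ~0.2538
print(f"recall    = {recall:.4f}")     # ~0.7112
print(f"f1        = {f1:.4f}")         # ~0.3741, matching max f1 in the metrics table
```

Note the precision is low despite a decent AUC: only about 13.4 % of the test rows are positives, so even a reasonable classifier generates many false positives at this threshold.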
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# Initialize the logistic regression model with explicit hyperparameters
# (the lambda parameter is passed as lambda_ in the Python API)
lr_model = H2OGeneralizedLinearEstimator(family="binomial", alpha=0.5, lambda_=0.01, standardize=True, solver="L_BFGS")

# Train the LogisticRegression model
lr_model.train(x=features, y=target, training_frame=train)

# Get raw coefficients (on the original feature scale)
coefficients = lr_model.coef()
print("Raw Coefficients: ", coefficients)
print("\n\n")

# Get normalized coefficients (scaled coefficients for better interpretability)
normalized_coefficients = lr_model.coef_norm()
print("Normalized Coefficients: ", normalized_coefficients)

# Evaluate model performance on the test set
performance_lr = lr_model.model_performance(test_data=test)
print(performance_lr)
glm Model Build progress: |██████████████████████████████████████████████████████| (done) 100%
Raw Coefficients:  {'Intercept': -1.6691520634414654, 'zone.1': 0.0, 'zone.2': 0.17532940188943535, 'zone.3': 0.0, 'zone.4': 0.0, 'zone.5': 0.0, 'zone.6': 0.0, 'zone.7': 0.0, 'zone.8': 0.0, 'zone.9': 0.0, 'zone.10': 0.0, 'zone.11': 0.0, 'zone.12': 0.0, 'zone.13': 0.0, 'zone.14': 0.0, 'zone.15': 0.0, 'zone.16': 0.0, 'zone.17': 0.0, 'zone.18': 0.0, 'zone.19': 0.0, 'zone.20': 0.0, 'age.1': 0.0, 'age.2': 0.0, 'age.3': 0.0, 'house.1': 0.0, 'house.2': 0.0, 'selfemp.1': 0.0, 'selfemp.2': 0.0, 'account.1': 0.0, 'account.2': 0.0, 'deposit.1': 0.0, 'deposit.2': 0.0, 'branch.1': 0.05732236887574749, 'branch.2': -0.057098899124051194, 'ref.1': 0.0, 'ref.2': 0.0, 'preloan.1': 0.0, 'preloan.2': 0.0, 'gender.1': 0.0, 'gender.2': 0.0, 'ms.1': 0.0, 'ms.2': 0.0, 'child.1': 0.0, 'child.2': 0.0, 'veh.1': 0.0, 'veh.2': 0.0, 'debtinc': 0.033472647089019016, 'creddebt': 0.11807593035780825, 'othdebt': 0.0, 'emp': -0.11289172521405828, 'address': -0.028321829923773047}



Normalized Coefficients:  {'Intercept': -2.1318189274905905, 'zone.1': 0.0, 'zone.2': 0.17532940188943535, 'zone.3': 0.0, 'zone.4': 0.0, 'zone.5': 0.0, 'zone.6': 0.0, 'zone.7': 0.0, 'zone.8': 0.0, 'zone.9': 0.0, 'zone.10': 0.0, 'zone.11': 0.0, 'zone.12': 0.0, 'zone.13': 0.0, 'zone.14': 0.0, 'zone.15': 0.0, 'zone.16': 0.0, 'zone.17': 0.0, 'zone.18': 0.0, 'zone.19': 0.0, 'zone.20': 0.0, 'age.1': 0.0, 'age.2': 0.0, 'age.3': 0.0, 'house.1': 0.0, 'house.2': 0.0, 'selfemp.1': 0.0, 'selfemp.2': 0.0, 'account.1': 0.0, 'account.2': 0.0, 'deposit.1': 0.0, 'deposit.2': 0.0, 'branch.1': 0.05732236887574749, 'branch.2': -0.057098899124051194, 'ref.1': 0.0, 'ref.2': 0.0, 'preloan.1': 0.0, 'preloan.2': 0.0, 'gender.1': 0.0, 'gender.2': 0.0, 'ms.1': 0.0, 'ms.2': 0.0, 'child.1': 0.0, 'child.2': 0.0, 'veh.1': 0.0, 'veh.2': 0.0, 'debtinc': 0.23407702777940398, 'creddebt': 0.27017212578208205, 'othdebt': 0.0, 'emp': -0.7508817795491343, 'address': -0.19403423264813324}
ModelMetricsBinomialGLM: glm
** Reported on test data. **

MSE: 0.1068477320999711
RMSE: 0.3268757135364619
LogLoss: 0.34835224598528186
AUC: 0.7572663314797378
AUCPR: 0.27029183283896796
Gini: 0.5145326629594755
Null degrees of freedom: 1390
Residual degrees of freedom: 1383
Null deviance: 1098.2293480025876
Residual deviance: 969.1159483310541
AIC: 985.1159483310541

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.15259066914618882
       0    1    Error    Rate
-----  ---  ---  -------  --------------
0      862  342  0.2841   (342.0/1204.0)
1      53   134  0.2834   (53.0/187.0)
Total  915  476  0.284    (395.0/1391.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.152591     0.404223  186
max f2                       0.12813      0.549853  222
max f0point5                 0.163157     0.321077  174
max accuracy                 0.544231     0.864845  0
max precision                0.287657     0.382353  51
max recall                   0.0528268    1         335
max specificity              0.544231     0.999169  0
max absolute_mcc             0.152591     0.310979  186
max min_per_class_accuracy   0.152591     0.715947  186
max mean_per_class_accuracy  0.152591     0.716262  186
max tns                      0.544231     1203      0
max fns                      0.544231     187       0
max fps                      0.00794667   1204      399
max tps                      0.0528268    187       335
max tnr                      0.544231     0.999169  0
max fnr                      0.544231     1         0
max fpr                      0.00794667   1         399
max tpr                      0.0528268    1         335

Gains/Lift Table: Avg response rate: 13.44 %, avg score: 13.07 %
group    cumulative_data_fraction    lower_threshold    lift       cumulative_lift    response_rate    score      cumulative_response_rate    cumulative_score    capture_rate    cumulative_capture_rate    gain      cumulative_gain    kolmogorov_smirnov
-------  --------------------------  -----------------  ---------  -----------------  ---------------  ---------  --------------------------  ------------------  --------------  -------------------------  --------  -----------------  --------------------
1        0.0100647                   0.399795           0.531322   0.531322           0.0714286        0.453613   0.0714286                   0.453613            0.00534759      0.00534759                 -46.8678  -46.8678           -0.00544975
2        0.0201294                   0.337394           4.25057    2.39095            0.571429         0.364437   0.321429                    0.409025            0.0427807       0.0481283                  325.057   139.095            0.0323476
3        0.0301941                   0.313935           3.18793    2.65661            0.428571         0.326165   0.357143                    0.381405            0.0320856       0.0802139                  218.793   165.661            0.0577887
4        0.0402588                   0.298437           2.65661    2.65661            0.357143         0.307709   0.357143                    0.362981            0.026738        0.106952                   165.661   165.661            0.0770515
5        0.0503235                   0.28357            3.18793    2.76287            0.428571         0.292093   0.371429                    0.348803            0.0320856       0.139037                   218.793   176.287            0.102493
6        0.100647                    0.246775           1.70023    2.23155            0.228571         0.263601   0.3                         0.306202            0.0855615       0.224599                   70.0229   123.155            0.143204
7        0.150252                    0.219894           1.83267    2.09986            0.246377         0.232602   0.282297                    0.281904            0.0909091       0.315508                   83.2675   109.986            0.190923
8        0.200575                    0.199525           1.80649    2.02626            0.242857         0.20882    0.272401                    0.263567            0.0909091       0.406417                   80.6494   102.626            0.237812
9        0.300503                    0.165897           2.24761    2.09986            0.302158         0.18189    0.282297                    0.236407            0.224599        0.631016                   124.761   109.986            0.381847
10       0.400431                    0.137122           1.23083    1.883              0.165468         0.150743   0.253142                    0.215029            0.122995        0.754011                   23.0831   88.2996            0.408496
11       0.500359                    0.113434           0.695687   1.64588            0.0935252        0.125474   0.221264                    0.197144            0.0695187       0.823529                   -30.4313  64.5876            0.373363
12       0.600288                    0.0935889          0.695687   1.4877             0.0935252        0.103295   0.2                         0.181521            0.0695187       0.893048                   -30.4313  48.7701            0.338231
13       0.700216                    0.0761042          0.749202   1.38231            0.100719         0.0852719  0.185832                    0.167785            0.0748663       0.967914                   -25.0798  38.2309            0.309277
14       0.800144                    0.0552309          0.267572   1.24309            0.0359712        0.0663865  0.167116                    0.155122            0.026738        0.994652                   -73.2428  24.3092            0.224719
15       0.900072                    0.032964           0.0535144  1.11102            0.00719424       0.0436912  0.149361                    0.142751            0.00534759      1                          -94.6486  11.1022            0.115449
16       1                           0.00794667         0          1                  0                0.0220876  0.134436                    0.130693            0               1                          -100      0                  0
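Comparing the two runs: the explicit elastic-net regularization (alpha=0.5, lambda_=0.01) zeroed out most of the categorical coefficients (the L1 component drives sparsity), giving a much simpler model with a lower AIC (985.12 vs 1058.63) and essentially the same AUC (0.7573 vs 0.7561). A small sketch of that comparison, with the metric values copied from the two outputs above:

```python
# Test-set metrics copied from the two model outputs above
models = {
    "default_glm":     {"auc": 0.7561, "aic": 1058.63, "logloss": 0.3460},
    "regularized_glm": {"auc": 0.7573, "aic": 985.12,  "logloss": 0.3484},
}

# Lower AIC is better: it rewards fit while penalizing model complexity
best_by_aic = min(models, key=lambda name: models[name]["aic"])
print(f"Preferred model by AIC: {best_by_aic}")  # regularized_glm
```

Since the discrimination (AUC) is nearly identical, the sparser regularized model is the natural choice: fewer active coefficients make it easier to interpret and less prone to overfitting.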